Abstract —Traditional approaches to handwritten Chinese character
recognition struggle to classify similar characters. In this paper, we
propose to discriminate similar handwritten Chinese characters by using
weakly supervised learning. Our approach learns a discriminative SVM for
each similar pair, which simultaneously localizes the discriminative
region of the similar characters and performs the classification. For the
first time, similar handwritten Chinese character recognition (SHCCR) is
formulated as an optimization problem extended from SVM. We also propose
a novel feature descriptor, Gradient Context, and apply the bag-of-words
model to represent regions at different scales. Our method does not need
to select a fixed-size sub-window to differentiate similar characters.
This “unconstrained” property makes our method well adapted to the high
variance in the size and position of discriminative regions in similar
handwritten Chinese characters. We evaluate the proposed approach on the
CASIA Chinese character data set, and the results show that our method
outperforms the state of the art.

Index Terms —Similar handwritten Chinese character recognition,
weakly supervised learning, bag-of-words, discriminative SVM


I. Introduction 

As the demand for optical character recognition (OCR) applications has
increased tremendously in recent years, research on OCR has received
great attention, especially unconstrained handwritten Chinese character
recognition (HCCR). Much progress has been achieved in HCCR in recent
decades. However, the experimental results on certain databases still
cannot satisfy the requirements of real applications or match human
recognition ability in terms of accuracy and efficiency. Chinese
character recognition is a large-scale classification problem which
involves more than 5000 frequently used characters and no fewer than
10000 for the whole character set, and there are thousands of pairs of
similar Chinese characters. Most HCCR systems adopt a one-model-fits-all
approach, but the state-of-the-art classifiers are easily confused when
classifying these similar pairs, which pulls down the overall recognition
accuracy. Therefore, solving the problem of similar Chinese character
recognition has the potential to improve the accuracy of handwritten
Chinese character recognition.

Similar Chinese characters often share the same radical or differ from
each other in some subtle part, such as a stroke or a dot, which we
denote as the discriminative region (DR) in this paper. See Figure 1 for
an illustration: the characters of one pair differ in their left radical,
and those of the other pair differ in the upper-left stroke. These
similar characters are hard to classify with existing approaches because
they usually employ a global statistical model, which may neglect the
local detailed features that are discriminative for those similar pairs.
Several works in the literature address this problem. T. F. Gao et al.
calculate a complementary distance on a discriminative vector whose
discriminability is evaluated by Linear Discriminant Analysis (LDA) and
compound it with the output of a baseline classifier to help
differentiate similar characters. K. C. Leung and C. H. Leung proposed a
critical region analysis technique, which highlights the critical regions
by Fisher's discriminant, to discriminate one character from another
similar character. B. Xu et al. also proposed to detect the most critical
region for each pair of similar characters by using Average Symmetric
Uncertainty (ASU). In [24], similar Chinese character recognition is
defined as a multiple-instance learning problem: Shao et al. employ the
AdaBoost framework to select the discriminative region for each pair of
similar characters. Recently, D. Tao et al. [25] introduced the
discriminative locality alignment (DLA) approach to discriminate each
character from other similar characters.

Figure 1. Discriminative regions of two pairs of similar characters.

The earlier methods select a sub-window with fixed size and position for
each pair of similar characters, which is not appropriate because the
critical region for each pair of handwritten Chinese characters may shift
in both position and scale due to the writing style. Shao et al. [24]
alleviated this problem by training a weak classifier for each similar
pair which adaptively selects the critical region among several
predefined window scales and locations for each testing sample. However,
the problem is not fully solved, and there is still room for improvement.
Moreover, experimental results in [26] showed that approaches that employ
a hierarchical treatment of patterns have considerable advantages over
one-model-fits-all approaches, not only improving recognition accuracy
but also reducing the computational cost. So, in this paper, we propose a
two-stage classification strategy in which we use MQDF, a successful
classifier frequently used in HCCR, as the baseline classifier and learn
an SVM as the second-level classifier that jointly localizes the
discriminative region of each pair of similar characters and makes the
classification. Methods such as [24] have to fix the window size because
otherwise the extracted features would have different dimensionality,
which is unacceptable for training the classifier. In our approach, we
remove this constraint on the scale of the critical region of each
similar pair by introducing a novel SIFT-like feature for the
discrimination of similar Chinese characters.

In the rest of this paper, Section 2 gives an overview of the proposed
method. Section 3 introduces the proposed SIFT-like feature descriptor.
Section 4 presents the detailed learning algorithm for SHCCR. Section 5
gives the experimental results and their analysis, and Section 6
concludes the paper.

II. System Overview 

In this section, we give a brief overview of our proposed HCCR system
equipped with SHCCR; see Figure 2 for the flow chart. The system mainly
consists of two components: the MQDF for traditional HCCR and the
discriminative SVM classifier for SHCCR. The traditional methods for HCCR
are well developed and widely used in many real systems; we refer readers
to the literature for the detailed algorithms. In this paper, we focus on
SHCCR.

For the integration of the baseline classifier (MQDF) and the SVM
classifier for SHCCR, we propose to use logistic regression $h_\theta(x)$
to evaluate the confidence of the baseline classifier's output from the
scores of the top two candidates. Given a testing sample character $d$,
MQDF outputs the scores $(s_1, s_2)$ of the top two candidates
$(c_1, c_2)$, where $c_1$ yields the higher probability of being the true
class, i.e., $s_1 > s_2$. The pair $(s_1, s_2)$ is then fed to the
trained logistic function to output the confidence $h_\theta(s_1, s_2)$.
If the confidence is below an acceptable confidence $\alpha$, we check
whether the top two candidates $(c_1, c_2)$ are in our set of similar
characters. If a match is found, the discriminative SVM is applied to
determine which class $d$ belongs to. In all other cases, $d$ is
classified as $c_1$ according to the output of the baseline classifier.
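
The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `baseline_top2`, `pair_classifiers` and the parameter layout `theta = (w0, w1, w2)` are hypothetical names introduced here, not part of the described system.

```python
import math

def logistic_confidence(s1, s2, theta):
    """Confidence h_theta(s1, s2) that the top-1 candidate is correct.

    theta = (w0, w1, w2) is a hypothetical parameter layout: a bias plus
    one weight per baseline score.
    """
    w0, w1, w2 = theta
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * s1 + w2 * s2)))

def classify(sample, baseline_top2, theta, alpha, pair_classifiers):
    """Two-stage decision: baseline first, pairwise SVM only when needed."""
    (c1, s1), (c2, s2) = baseline_top2(sample)    # assumes s1 > s2
    if logistic_confidence(s1, s2, theta) >= alpha:
        return c1                                 # baseline output is trusted
    svm = pair_classifiers.get(frozenset((c1, c2)))
    if svm is None:
        return c1                                 # not a known similar pair
    return svm(sample)                            # discriminative SVM decides
```

Storing the pairwise classifiers under an unordered key (`frozenset`) reflects that the pair (A, B) and the pair (B, A) share one classifier.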

III. Feature Construction for SHCCR

Scale-Invariant Feature Transform (SIFT) is a very popular feature
extraction strategy widely adopted in many computer vision tasks.
However, the original SIFT cannot be directly applied to HCCR, because
handwritten Chinese characters exhibit many different writing styles,
whose variation is more complex than changes in rotation and scale. Some
works, such as [29], attempt to modify SIFT and make it usable for
handwritten character recognition, but they are designed for the global
classifier in HCCR. In this section, we present our proposed SIFT-like
feature, Gradient Context (GC), specially designed for SHCCR.

Figure 2. Flow chart of the proposed HCCR system (panel label: MQDF for
traditional HCCR).

Inspired by SIFT and SCIP, we sample our seed points from the external
contours of each normalized character image and extract a local feature
descriptor for each of them by employing our proposed Gradient Context (a
modified version of Shape Context) together with the bag-of-visual-words
(BoVW) representation. Then, we obtain a visual dictionary for each
similar pair by clustering all feature descriptors with the K-means
algorithm. Finally, we compute a histogram which counts the occurrences
of each codeword in a selected region of arbitrary scale. Figure 3 gives
a diagram of our proposed feature extraction procedure.

A. Seeds Selection 

From the viewpoint of human perception, people can easily recognize
characters as long as they are given the contour of the character image.
Thus, a character can be represented by a set of discrete points sampled
from its internal or external contours. In our approach, we sample the
set of seed points $P = \{p_1, p_2, \ldots, p_n\}$, $p_i \in \mathbb{R}^2$,
as locations of pixels on the external contours of each character
detected by an edge detector (e.g. the Sobel operator). There is no need
to keep all pixels on the contour: to save computational resources we
keep only a subset, and we can still approximate the underlying
continuous contour as closely as desired by making $n$ sufficiently
large. In this paper, we select one pixel from every two consecutive
pixels on the contour as a keypoint. In this way, we obtain roughly 200
to 400 keypoints for each character in the similar pairs.
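
As a rough illustration of the sampling step, the sketch below marks boundary pixels of a binary glyph and keeps every second one. It is a simplification, not the procedure described above: it uses a 4-neighbour boundary test instead of the Sobel operator, does not separate external from internal contours, and subsamples in raster order rather than along a traced contour. The function name is hypothetical.

```python
import numpy as np

def sample_seed_points(img, step=2):
    """Return (row, col) seed points on the boundary of a binary glyph.

    img: 2-D array, nonzero = ink. A foreground pixel counts as a contour
    pixel when at least one of its 4-neighbours is background.
    """
    fg = np.asarray(img) > 0
    padded = np.pad(fg, 1, constant_values=False)
    # True where all four neighbours are foreground (interior pixel).
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    contour = fg & ~interior
    points = np.argwhere(contour)      # raster order, not traced order
    return points[::step]              # keep one pixel out of every `step`
```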






























Figure 3. Diagram of feature extraction procedure. 

B. Feature Descriptor 

For each seed point, we want to calculate a discriminative descriptor
that can represent its local features. Chinese characters are generally
composed of strokes in different directions, so gradient features are
more informative than other features in HCCR. Many experimental results
also show that gradient features perform very well in handwritten
character recognition. Thus, in our approach, we propose to capture the
distribution of the gradients in the neighborhood of each seed point by
using log-polar histogram bins.

For each character, the gradients are computed numerically by applying
the Sobel operator to each pixel of the character image. Then, for each
seed point $p_i$ (the origin of the log-polar histogram bins), we compute
a coarse histogram $h_i$ of the relative gradient distribution of the
other seed points within a neighborhood $\Omega$ of $p_i$ parameterized
by four radii $\{r_i\}_{i=1}^{4}$:

$$h_i(k) = \sum_{q \in \Omega,\ q \neq p_i,\ q - p_i \in \mathrm{bin}(k)} G_q \tag{1}$$

where $G_q$ is the gradient at point $q$. We define the histogram $h_i$
as the Gradient Context of point $p_i$. For discriminating similar
Chinese characters, the extracted feature should be able to represent its
locality. Equation (1) complies with this principle by making the
generated feature focus more on nearby points than on points far away.
Moreover, the neighborhood $\Omega$ should not be too large, since
otherwise the histogram $h_i$ will bring in information from outside the
discriminative region, which may reduce the discriminability of the
feature. As in Shape Context, we divide the neighborhood into 8 angular
sectors of equal size according to direction, and 4 spiral bins centered
at $p_i$ separate the neighborhood into 4 pieces of different sizes. As a
result, there are 32 histogram bins in total, which makes the visual
vectors 32-dimensional.
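
A minimal sketch of such a 32-bin descriptor follows. It rests on assumptions that the text does not fix: $G_q$ is taken to be the gradient magnitude, the four radial bins are concentric rings with outer radii 3, 4, 8 and 16 pixels (the values reported in the parameter setup), and angles are measured from the column axis. The function name is hypothetical.

```python
import numpy as np

def gradient_context(points, grad_mag, i, radii=(3, 4, 8, 16), n_angles=8):
    """32-bin log-polar histogram of gradient magnitudes around points[i].

    points:   (n, 2) integer array of (row, col) seed points.
    grad_mag: 2-D array of gradient magnitudes (e.g. from a Sobel filter).
    """
    p = points[i]
    others = np.delete(points, i, axis=0)            # enforce q != p_i
    d = (others - p).astype(float)
    dist = np.hypot(d[:, 0], d[:, 1])
    inside = (dist > 0) & (dist <= radii[-1])        # restrict to neighbourhood
    others, d, dist = others[inside], d[inside], dist[inside]
    # Radial bin: index of the first radius that contains the point.
    ring = np.searchsorted(np.asarray(radii, dtype=float), dist, side="left")
    # Angular bin: 8 equal sectors of 45 degrees.
    theta = np.mod(np.arctan2(d[:, 0], d[:, 1]), 2 * np.pi)
    sector = np.minimum((theta / (2 * np.pi / n_angles)).astype(int),
                        n_angles - 1)
    hist = np.zeros(len(radii) * n_angles)
    np.add.at(hist, ring * n_angles + sector,
              grad_mag[others[:, 0], others[:, 1]])
    return hist
```

Because the inner rings are much narrower than the outer one, nearby seed points are resolved more finely than distant ones, which is the locality property discussed above.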

C. Visual Dictionary Learning 

For each pair of similar handwritten Chinese characters, given the
collection of visual vectors extracted from all training samples, we
learn a visual dictionary by clustering all the visual vectors with the
K-means algorithm. Empirically, clusters with too few members should be
pruned. Each remaining cluster center is defined as a codeword of the
learned dictionary. Each 32-dimensional visual vector can then be
represented by the index of its closest codeword in the learned
dictionary, which reduces the computational cost substantially. We
consider the histogram a robust and compact yet highly discriminative
descriptor. Therefore, we use a coarse histogram, which counts the
occurrences of each codeword of the visual dictionary, to represent the
local features in a sub-window of arbitrary size.
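
The point that a codeword histogram has the same length for any window size can be illustrated as follows. This is a sketch with hypothetical function names; the codebook is assumed to be given (for example, K-means centers) and Euclidean distance is assumed for the nearest-codeword assignment.

```python
import numpy as np

def assign_codewords(descriptors, codebook):
    """Index of the nearest codeword (Euclidean) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def window_histogram(points, words, n_words, top, left, height, width):
    """Count codeword occurrences among seed points inside a sub-window."""
    r, c = points[:, 0], points[:, 1]
    inside = (r >= top) & (r < top + height) & (c >= left) & (c < left + width)
    return np.bincount(words[inside], minlength=n_words)
```

Whatever `height` and `width` are, the returned vector has `n_words` entries, so sub-windows of different scales can be fed to the same classifier.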

IV. Discriminative SVM for SHCCR 

Consider a pair of similar characters, A and B. In general, A and B can
differ from each other in two ways. In the first case, A contains some
parts (strokes) that B does not, or vice versa. In the second case,
different radicals or strokes occupy the same region of A and B; for
example, two similar characters may differ in their left radical but
share an identical right radical, or differ only in their central parts.
In our approach, however, we treat the second category as a special case
of the first one, since the fact that two characters differ in their left
radical can also be perceived as one character having a radical that the
other lacks. Therefore, for all similar pairs of Chinese characters, our
task is to localize the most discriminative parts, which appear in only
one of the two similar characters, and use them to distinguish the pair.
Most existing methods tackle this task in two independent steps:
localization of the discriminative region and decision making. However,
separating localization from recognition causes information loss. So in
this paper, we propose to learn an SVM that jointly performs DR
localization and classification.

For each pair of similar Chinese characters, we define one of them as
positive and the other as negative. There is no particular rule for
deciding which character should be positive: either A or B can be chosen,
and the two similar characters have equal priority during training. Our
goal is then to find a region that exists in the positive class but not
in the negative class. This is similar to learning to detect objects from
training examples with weak (binary) labels that indicate the presence of
an object but not its location. Once our problem is cast as an object
detection problem, many powerful tools can be applied, such as SVM,
multiple-instance learning and weakly supervised learning. Inspired by
this line of work, we first formulate SHCCR as an optimization problem
extended from SVM and then propose a subgradient algorithm to solve it.

A. SVM Learning 

Given a set of training samples of a pair of similar Chinese characters,
let $D^+ = \{d_1^+, \ldots, d_n^+\}$ and $D^- = \{d_1^-, \ldots, d_m^-\}$
denote the positive set and the negative set, respectively, where each
sample $d$ is represented by a coarse histogram. Finding the most
discriminative region is equivalent to learning an SVM with the maximum
margin between the two classes of data, where the data is represented by
the features extracted from all sub-windows of each training sample from
$D^+ \cup D^-$. Let $\Psi(d_i)$ denote the set of feature vectors
extracted from all possible sub-windows of training sample $d_i$. Then we
formulate SHCCR as the following optimization problem:

$$\min_{\omega, b, \xi} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n+m} \xi_i \tag{2}$$

$$\text{s.t.} \quad \max_{x \in \Psi(d_i^+)} \{\omega^T x + b\} \ge 1 - \xi_i, \quad 1 \le i \le n \tag{3}$$

$$\max_{x \in \Psi(d_j^-)} \{\omega^T x + b\} \le -1 + \xi_{j+n}, \quad 1 \le j \le m \tag{4}$$

$$\xi_i \ge 0, \quad 1 \le i \le n + m$$

where $\omega$ is the normal vector with $b$ being the bias, $\xi_i$ is
the slack variable that allows some violations in the data during
training, and $C$ denotes the trade-off coefficient. The constraints in
the above learning objective state that there must be a positive
sub-window in each sample from the positive set $D^+$, while all
sub-windows in $D^-$ are supposed to be classified as negative, which is
similar to the support vector machine for multiple-instance learning. By
solving the above optimization problem, we obtain a soft-margin SVM
classifier which detects the most discriminative region and makes the
decision at the same time.
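
To make the role of the maximum over sub-windows concrete, the sketch below evaluates this kind of objective for a given $(\omega, b)$ and uses the same maximum to classify a sample. It assumes each sample is stored as a "bag", a 2-D array with one row per sub-window feature vector; the function names are hypothetical.

```python
import numpy as np

def bag_score(w, b, bag):
    """Best sub-window score max_x (w^T x + b) and the index attaining it."""
    scores = bag @ w + b
    k = int(np.argmax(scores))
    return float(scores[k]), k

def objective(w, b, pos_bags, neg_bags, C):
    """0.5*||w||^2 + C * (sum of the smallest feasible slacks)."""
    slack = 0.0
    for bag in pos_bags:      # positives: best sub-window must score >= 1
        slack += max(0.0, 1.0 - bag_score(w, b, bag)[0])
    for bag in neg_bags:      # negatives: every sub-window must score <= -1
        slack += max(0.0, 1.0 + bag_score(w, b, bag)[0])
    return 0.5 * float(w @ w) + C * slack

def predict(w, b, bag):
    """Localise the best sub-window, then classify by the sign of its score."""
    score, k = bag_score(w, b, bag)
    return (1 if score > 0 else -1), k
```

For a negative sample, bounding the maximum score bounds every sub-window at once, which is why a single `max` suffices in both loops.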

Given a testing sample $d$, we first localize the discriminative region
by finding the feature vector (representing a certain sub-window of $d$)
that yields the maximum SVM score:

$$x^* = \arg\max_{x \in \Psi(d)} (\omega^T x + b)$$

where $\Psi(d)$ is the set of feature vectors extracted from all
sub-windows of $d$. Then, the classification is made according to the
value of $y = \omega^T x^* + b$. If $y > 0$, the testing sample $d$ is
classified as positive; otherwise it is classified as negative.

B. Solution approach to SVM learning

In this section, we combine the Cutting Plane Algorithm with standard
convex optimization to solve our learning objective (2), whose
constraints are in general non-convex.

For the optimization objective (2), the non-convex constraints (3) are
difficult to handle. To tackle this issue, we first rewrite the original
formulation as the following unconstrained optimization problem:

$$\min_{\mu} \ \Big\{ f(\mu) + C \sum_{i=1}^{n} g_i(\mu) + C \sum_{j=1}^{m} h_j(\mu) \Big\} \tag{5}$$

where $\mu = (\omega, b)$, $f(\mu) = \frac{1}{2}\|\omega\|^2$, and

$$g_i(\mu) = \max\Big\{-1, \min_{x \in \Psi(d_i^+)} \phi(x, \mu)\Big\}; \quad \phi(x, \mu) = -\omega^T x - b$$

$$h_j(\mu) = \max\Big\{-1, \max_{y \in \Psi(d_j^-)} \psi(y, \mu)\Big\}; \quad \psi(y, \mu) = \omega^T y + b$$

Our strategy is to convert the complex learning objective (2) to the
traditional SVM optimization problem, which can be solved using
well-developed tools. This can be achieved with the help of the following
lemma.

Proposition 1. Let $\{f_i\}_{i \in I}$ be an arbitrary family of convex
functions defined on the same domain. Then, the pointwise supremum
$f = \sup_{i \in I} f_i$ is convex.

Proof: This is a standard result of convex analysis; refer to the
literature for the detailed proof. ■

Lemma 1. The following function

$$f(\mu) + C \sum_{j=1}^{m} h_j(\mu) \tag{6}$$

is a convex function of $\mu$.

Proof: Firstly, observe that $\psi(y, \mu) = \omega^T y + b$ is a linear
function of $\mu$. Applying Proposition 1, it is straightforward to show
that $h_j(\mu) = \max\{-1, \max_{y \in \Psi(d_j^-)} \psi(y, \mu)\}$ is
convex. Secondly, $f(\mu)$ is a convex function of $\mu$. This completes
the proof. ■

Lemma 1 indicates that if we fix the sub-window for each positive sample
in constraints (3), the formulation becomes a convex optimization
problem. Therefore, we first use the Cutting Plane Algorithm (also used
in multiclass maximum margin clustering by Zhao et al. (Zhao et al.,
2008)) to select the most violated feature vector extracted from the
sub-windows of each positive sample at every iteration. See Algorithm 1
for the detailed implementation.

Algorithm 1: Cutting Plane Algorithm for Constraints (3)

Data: $D^+$ and $D^-$
Result: $\omega$ and $b$
1 Initialize $\mathcal{L} = \emptyset$;
2 repeat
3     $tv = 0$;
4     for $1 \le i \le n$ do
5         Select the most violated feature:
              $x_i = \arg\max_{x \in \Psi(d_i^+)} (\omega^T x + b)$    (7)
              $\mathcal{L} \leftarrow \mathcal{L} \cup \{x_i\}$;
6     Optimize for SVM with constraints (8):
          $(\omega, b, \xi) = \arg\min \big( \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n+m} \xi_i \big)$
          s.t. $\omega^T x_i + b \ge 1 - \xi_i$, $x_i \in \mathcal{L}$, $1 \le i \le n$
               $\max_{x \in \Psi(d_j^-)} \{\omega^T x + b\} \le -1 + \xi_{j+n}$, $1 \le j \le m$
               $\xi_i \ge 0$, $1 \le i \le n + m$    (8)
7 until $tv < r$;

As illustrated in Algorithm 1, we make the large constraint set (3)
manageable in size by adding only the most violated constraint for each
positive training sample. The algorithm stops when the total violation in
one iteration is smaller than an acceptable precision. The following
theorem guarantees that Algorithm 1 converges to a local optimum of our
learning objective.

Theorem 1. Algorithm 1 stops within a finite number of iterations and
converges to a local optimum of the learning objective.

Refer to Appendix A for the detailed proof. We proceed to solve the
convex optimization problem (8) under the current constraint set at each
iteration. Similarly, (8) is equivalent to minimizing the following
unconstrained optimization problem over $\mu$:

$$\Gamma(\mu) = f(\mu) + C \sum_{i=1}^{n} g_i(x_i, \mu) + C \sum_{j=1}^{m} h_j(\mu) \tag{9}$$

where $g_i(x_i, \mu) = \max\{-1, -\omega^T x_i - b\}$ is $g_i$ evaluated
at the feature vector $x_i$ currently fixed for positive sample $i$.
Based on Lemma 1, $\Gamma(\mu)$ is a convex function of $\mu$. However,
$\Gamma(\mu)$ is not differentiable in general, and thus the traditional
gradient descent approach is not applicable to this problem. Hence, we
adopt the subgradient method. To begin with, we derive the subgradient of
$\Gamma(\mu)$ via the following lemma:

Lemma 2. The subgradient of $\Gamma(\mu)$ is determined by Equation (10)
and Equation (11), where $\mathbb{1}_A$ equals 1 if condition $A$ holds
and 0 otherwise.

$$y_j^* = \arg\max_{y \in \Psi(d_j^-)} \{\omega^T y + b\} \tag{11}$$

Here $\Psi(d_j^-)$ is the set of sub-window feature vectors of negative
sample $d_j^-$. Refer to Appendix B for the detailed proof. Following
Lemma 2, we design the subgradient descent algorithm below to solve the
optimization problem (8), where $\alpha_k$ is the step size. We choose
the step size in the theorem below to guarantee the convergence of
Algorithm 2.
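
For reference, one subgradient that is consistent with the definitions of the positive and negative loss terms can be written out explicitly. The block below is a sketch derived from those definitions, not a transcription of the numbered equations: it assumes $x_i$ is the feature vector currently fixed for positive sample $i$, writes $\Psi(d)$ for the set of sub-window feature vectors of sample $d$, and uses $\mathbb{1}_A$ as the indicator of condition $A$.

```latex
% Assumed objective (fixed positives x_i, negative bags \Psi(d_j^-)):
%   \Gamma(\mu) = \tfrac12\|\omega\|^2
%               + C \sum_{i=1}^{n} \max\{-1,\; -\omega^T x_i - b\}
%               + C \sum_{j=1}^{m} \max\{-1,\; \max_{y \in \Psi(d_j^-)} (\omega^T y + b)\}
\begin{aligned}
y_j^* &= \arg\max_{y \in \Psi(d_j^-)} \{\omega^T y + b\}, \\
\frac{\partial \Gamma}{\partial \omega} &\ni
  \omega
  - C \sum_{i=1}^{n} \mathbb{1}_{\omega^T x_i + b < 1}\, x_i
  + C \sum_{j=1}^{m} \mathbb{1}_{\omega^T y_j^* + b > -1}\, y_j^*, \\
\frac{\partial \Gamma}{\partial b} &\ni
  - C \sum_{i=1}^{n} \mathbb{1}_{\omega^T x_i + b < 1}
  + C \sum_{j=1}^{m} \mathbb{1}_{\omega^T y_j^* + b > -1}.
\end{aligned}
```

Each indicator switches on exactly when the corresponding sample violates its margin, so samples that already satisfy their constraint contribute nothing to the update.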


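Algorithm 2: Subgradient Descent Algorithm

1. Initialize $k = 1$ and $\mu_k = \mu$;

2. Repeat the following steps until convergence;

3. In the $k$-th step, fix $\mu_k$, optimize over $y$ and obtain
   $y^*_{jk} = \arg\max_{y \in \Psi(d_j^-)} \psi(y, \mu_k)$, for $j = 1, \ldots, m$;

4. Update $\mu_{k+1}$ based on the following equation:

   $$\mu_{k+1} = \mu_k - \alpha_k \left.\frac{\partial \Gamma}{\partial \mu}\right|_{\mu = \mu_k}$$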
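
The iteration can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: it assumes one fixed feature vector per positive sample, a list of sub-window feature arrays per negative sample, the hinge-style losses defined earlier, and a diminishing step size of 1/k, which is an assumption made here for the example. The function name is hypothetical.

```python
import numpy as np

def subgradient_descent(pos_x, neg_bags, C=1.0, n_iter=500):
    """Minimise 0.5*||w||^2 + C*(loss on fixed positives + loss on negative bags).

    pos_x:    (n, d) array, one fixed feature vector per positive sample.
    neg_bags: list of (k_j, d) arrays, all sub-window features of a negative.
    """
    w, b = np.zeros(pos_x.shape[1]), 0.0
    for k in range(1, n_iter + 1):
        gw, gb = w.copy(), 0.0                   # gradient of 0.5*||w||^2
        active = pos_x @ w + b < 1.0             # positives violating the margin
        gw -= C * pos_x[active].sum(axis=0)
        gb -= C * active.sum()
        for bag in neg_bags:
            scores = bag @ w + b
            j = int(np.argmax(scores))           # y*_j: best-scoring sub-window
            if scores[j] > -1.0:                 # negative bag violating the margin
                gw += C * bag[j]
                gb += C
        step = 1.0 / k                           # assumed diminishing step size
        w, b = w - step * gw, b - step * gb
    return w, b
```

Only the best-scoring sub-window of each negative sample enters the update, which mirrors step 3 of the algorithm.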




Theorem 2. By choosing the step size $\alpha_k$ such that
$\sum_k \alpha_k = \infty$ while $\sum_k \alpha_k^2$ remains bounded (the
two conditions used in Appendix C), the sequences generated by
Algorithm 2 converge to the global optimal solution to the optimization
problem (8).

Refer to Appendix C for the detailed proof of Theorem 2. Even though our
optimization approach converges, the algorithm still needs to search, at
every iteration, all possible sub-windows of each training sample for the
feature vector that maximizes the SVM score. Therefore, the algorithm
requires a very fast optimal region localization procedure. We propose
two strategies to expedite the localization process: one is to employ the
efficient sub-window search algorithm proposed by Lampert et al.; the
other is to predefine a sliding step and several scale templates
corresponding to the structures of Chinese characters. According to our
experimental results, the second strategy is much more efficient and
achieves nearly as good performance as the first one. Moreover, the
integral image can also be used to speed up the search.

Figure 4. Normalized handwritten samples of similar Chinese characters
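
One way the integral image idea could apply to codeword histograms is sketched below: a cumulative count table is built once per image, after which the histogram of any sub-window costs four look-ups. The layout (one channel per codeword, a 64 x 64 image) and the function names are assumptions made for this example.

```python
import numpy as np

def integral_histogram(points, words, n_words, size=64):
    """Cumulative codeword counts: ih[r, c, k] = # of word k in rows < r, cols < c."""
    counts = np.zeros((size, size, n_words), dtype=np.int64)
    np.add.at(counts, (points[:, 0], points[:, 1], words), 1)
    ih = np.zeros((size + 1, size + 1, n_words), dtype=np.int64)
    ih[1:, 1:] = counts.cumsum(axis=0).cumsum(axis=1)
    return ih

def window_histogram_fast(ih, top, left, height, width):
    """Histogram of the sub-window from four table look-ups."""
    bottom, right = top + height, left + width
    return ih[bottom, right] - ih[top, right] - ih[bottom, left] + ih[top, left]
```

The table is built once per sample, so its cost is amortized over all sub-windows scored during training.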


V. Experiments 

To evaluate the effectiveness of our proposed algorithms for SHCCR, we
first compare our approach with the traditional MQDF. Then, we conduct
another group of experiments to compare the discriminative SVM for SHCCR
with two baselines: critical region selection by Average Symmetric
Uncertainty (ASU) and by Multiple-Instance Learning (MIL) [24]. The
comparisons are made on the same set of similar Chinese characters under
our Logistic Regression (LR)-based framework. All of our experiments are
conducted on a large handwritten Chinese character database, CASIA, which
contains binary images of 3,755 frequently used characters (the level-1
set of GB2312-80), with 300 samples in each class.

A. MQDF Training

In our approach, 270 out of the 300 samples in each class are randomly
selected to train MQDF, and the rest are used for testing. For each
sample image, we employ the pseudo-two-dimensional bi-moment
normalization method and the normalization-cooperated gradient feature
(NCGF) extraction strategy to jointly extract a 512-dimensional feature
vector without explicitly normalizing the image to a fixed scale (64 x 64
pixels in our method). Linear Discriminant Analysis (LDA) is used to
reduce the dimensionality of the input features from 512 to 200. It is
also worth noting that, before feeding the feature vector into the
classifier, we apply the power transformation (also known as the Box-Cox
transformation) to make the feature distribution closer to a Gaussian
distribution.

B. Parameters Setup 

First of all, we need to build a set of similar Chinese characters in
order to train the discriminative SVM. However, there is no standard
definition of similar Chinese characters, because “similar” is a vague
concept that depends heavily on human perception. In our approach, we
define the set of similar Chinese characters from the viewpoint of
“machine perception”: the machine (baseline classifier) perceives
characters that are often misclassified as each other as similar
characters. If the number of times character A is misclassified as
character B, plus the number of times B is misclassified as A, is more
than a threshold T, A and B are deemed a pair of similar characters. In
our experiments with MQDF, 30 samples are extracted from each class (3755
classes in total) for testing, and MQDF outputs 2383 misclassified items.
We obtain a set of 1105 pairs of similar Chinese characters by setting
T = 2. Figure 4 gives some examples from the obtained set.

Table I

Recognition accuracy of the proposed method with different α

α             0.7    0.8    0.9    0.92   0.94   0.95   0.96   0.97   0.98   1.00
Accuracy (%)  97.90  97.95  98.13  98.22  98.28  98.28  98.29  98.25  98.19  98.05
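
The pair-mining rule can be sketched as follows. The input format (a list of `(true_label, predicted_label)` tuples for the misclassified test samples) and the function name are assumptions for illustration, and "more than a threshold" is read here as a strict inequality.

```python
from collections import Counter

def mine_similar_pairs(errors, threshold=2):
    """Return unordered class pairs confused more than `threshold` times.

    errors: iterable of (true_label, predicted_label) for misclassified samples.
    Confusions A->B and B->A are pooled into one count for the pair {A, B}.
    """
    counts = Counter(frozenset(e) for e in errors if e[0] != e[1])
    return {pair for pair, n in counts.items() if n > threshold}
```

Pooling both directions into one unordered key matches the definition above, in which a pair is symmetric.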

For each pair of similar Chinese characters, we train a discriminative
SVM to jointly localize the discriminative region and distinguish the two
characters. We first normalize each sample image to 64 x 64 pixels with
the bi-moment normalization method and then extract features from
sub-windows of each sample image using our proposed Gradient Context
operator, where we set the four radii of the neighborhood to 3, 4, 8 and
16, respectively. All feature descriptors are projected onto the learned
visual dictionary, and the locality of each sample character is
represented by a coarse histogram which counts the occurrences of each
codeword in each sub-window. For the sake of efficiency of our
optimization algorithm, we select 9 scales for the sliding sub-window,
moved with a step of 4 pixels, according to the structures of Chinese
characters. The selected scales (width, height) are as follows: (64, 24),
(24, 64), (32, 32), (16, 16), (24, 24), (16, 48), (48, 16), (64, 32),
(32, 64). The stopping threshold r in Algorithm 1 is empirically set to
0.6.
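
The candidate sub-windows implied by these settings can be enumerated as in the sketch below. It assumes windows are anchored on a 4-pixel grid starting at the top-left corner and never cross the image border; neither detail is stated in the text, and the function name is hypothetical.

```python
SCALES = [(64, 24), (24, 64), (32, 32), (16, 16), (24, 24),
          (16, 48), (48, 16), (64, 32), (32, 64)]   # (width, height)

def enumerate_windows(size=64, stride=4, scales=SCALES):
    """Yield (left, top, width, height) for every sub-window inside the image."""
    for width, height in scales:
        for top in range(0, size - height + 1, stride):
            for left in range(0, size - width + 1, stride):
                yield left, top, width, height
```

Under these assumptions a 64 x 64 image has 541 candidate sub-windows, small enough for the exhaustive maximization over sub-windows to remain cheap.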

C. Evaluation of the Proposed HCCR System 

In this section, we compare our proposed approach with the traditional
MQDF. Before the comparison, we first need to configure some parameters
of our method.

Given a well-trained MQDF, we randomly select 30 samples from each class
of the CASIA database, which gives 112,650 testing samples in total, and
feed them to the trained MQDF, which outputs the scores of the top two
candidates $(x_1, x_2)$ together with an indicator $y$ of whether the
testing sample is correctly classified ($y = 1$ means correct, $y = 0$
otherwise). We thus obtain a pool of data
$X = \{(x_1^{(i)}, x_2^{(i)}, y_i)\}_{i=1}^{N}$, where $N$ is the number
of testing samples, and $X$ is used to train a logistic function. In
order to combine the two classifiers, namely MQDF and the discriminative
SVM, we need to specify an acceptable confidence $\alpha$ for the
logistic function. Table I gives the recognition accuracy of our proposed
method with different $\alpha$.

From Table I, we select 0.96 as the optimal acceptable confidence for our
logistic regression. Moreover, by comparing the recognition accuracy at
$\alpha = 0.96$ and $\alpha = 1.00$ (working without logistic
regression), we can see that logistic regression helps boost the
recognition accuracy when combining the two classifiers.

With the optimal acceptable confidence determined, we compare our
proposed HCCR system, which combines the SHCCR component with the
traditional MQDF, against MQDF alone. The experimental results are
presented in Table II, which shows that our proposed discriminative SVM
for SHCCR improves the recognition accuracy of the HCCR system from
97.89% to 98.29%.

Table II

Recognition accuracy for different HCCR systems

              MQDF   Ours
Accuracy (%)  97.89  98.29

Table III

Recognition accuracy of different methods

Method        ASU    MIL    Ours
Accuracy (%)  98.19  98.00  98.29

D. Evaluation of the SHCCR Component 

Most recognition systems with an SHCCR component use a baseline
classifier to output similar characters, which are then further
classified by a discriminative classifier [24], [28]. Thus, we evaluate
our proposed discriminative SVM for SHCCR by comparing it with two other
competitive methods proposed for SHCCR over the same predefined set of
similar Chinese characters. For a fair comparison, all methods for SHCCR
are evaluated under our logistic-regression-based framework. We denote
the method based on Average Symmetric Uncertainty as ASU and the method
proposed in [24] as MIL.

In ASU, the features extracted from all the critical regions are supposed
to be fed into an LDA classifier for each pair of similar characters.
However, in our experiments, the calculated covariance matrix may not be
positive definite due to the high-dimensional feature space and the
limited training samples for each pair. So in our implementation of ASU,
an SVM, a native two-class classifier which is free of that constraint,
replaces LDA. For each pair of similar characters, MIL requires learning
31 weak classifiers:
















h( T m q'I'I — / ^)) ^ Pw < X 

n[i,i^[x,y,s)) - I otherwise 

( 12 ) 

where $I$ is an instance, $B(x, y, s)$ represents a small bag, $T_w$ is
a threshold and $p_w \in \{1, -1\}$. However, the optimization method for
learning the parameters of the weak classifier is not specified in [24].
It is therefore worth noting that we use exhaustive search to optimize
the parameters $I$, $B(x, y, s)$ and $p_w$, since they are discrete
variables belonging to three finite sets. The Perceptron learning
algorithm is employed to learn an optimal value of $T_w$ by minimizing
the following objective function:

f{x) = - ^ distr{i) x yi x (13) 

where $O$ indexes the misclassified samples, and $y_i \in \{1, -1\}$ and
$distr(i)$ denote the label and the assigned weight of the $i$-th
training sample. Stochastic gradient descent is used to find a minimum of
$f(x)$. The detailed optimization procedure is presented in Algorithm 3.
We list the experimental results in Table III.


Algorithm 3: Optimization Procedure for (13)

1 Initialize: $T_w = 0$, $k = 1$, $T = 20$, $D = \{(x_i, y_i) \mid 1 \le i \le N\}$
2 repeat
3     for $1 \le i \le N$ do
4         $\hat{y}_i = h(I, B(x, y, s))$
5         $T_w = T_w + distr(i)\,\alpha\,(y_i - \hat{y}_i)$
6     $k = k + 1$;
7 until $k > T$;


Our method outperforms ASU and MIL in recognition accuracy on similar
Chinese characters (Table III). ASU first divides each image into
8 x 8 = 64 small regions and then determines whether a small region is
critical according to its Average Symmetric Uncertainty. In our
experiments, we found that the critical regions selected by ASU are often
dispersed and differ from the regions that a human would perceive as
critical for a given pair of similar Chinese characters. This is probably
caused by the highly variable writing styles of handwritten Chinese
characters. MIL, in turn, cannot localize a single critical region for
each similar pair, since each weak classifier holds its own critical
region and these regions differ from each other within a pair. In
contrast, our method explicitly selects a critical region for each pair
of similar characters, and our experiments show that the selected
critical region generally agrees with the human perception of what is
critical. Several examples of discriminative region localization using
ASU, MIL and our method are presented in Figure 5.

Since our approach treats all sub-windows in the negative samples as
negative, no discriminative region is detected in negative samples, and
none is shown here.



Figure 5. Localization of discriminative region of pairs of similar Chinese 
characters 

VI. Conclusion 

Focusing on SHCCR, we propose to use weakly labeled data to learn an SVM
which localizes the highly discriminative regions of similar characters
and makes the classification simultaneously. To make our method more
robust for the SHCCR problem, we also propose a novel SIFT-like feature
descriptor with which we do not need to constrain the scale of the
sliding window, which in turn improves the recognition accuracy. The
experimental results show that our method is superior to several
competitive approaches and improves the accuracy of handwritten character
recognition. However, our method can still be improved in several
respects. First, when training the discriminative SVM, the fact that the
discriminative regions of two similar characters generally lie in the
same part of the two characters can be utilized to accelerate the
optimization process. Moreover, the “unconstrained” property (no
constraints are put on the scale of the sliding window) of our SIFT-like
feature is not fully exploited in this paper; it can be used to further
improve the performance of our method. Our future work will focus on
these points.

Appendix


A. Proof of Theorem 1

Proof: At the $k$-th iteration ($k > 1$) of Algorithm 1, it holds that
$g_i(x_i^k, \mu_{k-1}) \le g_i(x_i^{k-1}, \mu_{k-1})$, where
$x_i^k = \arg\min_{x \in \Psi(d_i^+)} \{-\omega_{k-1}^T x - b_{k-1}\}$.
Let $obj(k)$ denote the objective value at the $k$-th iteration.
Therefore, we have

$$\begin{aligned}
obj(k) &= f(\mu_k) + C \sum_{i=1}^{n} g_i(x_i^k, \mu_k) + C \sum_{j=1}^{m} h_j(\mu_k) \\
&\le f(\mu_{k-1}) + C \sum_{i=1}^{n} g_i(x_i^k, \mu_{k-1}) + C \sum_{j=1}^{m} h_j(\mu_{k-1}) && (14) \\
&\le f(\mu_{k-1}) + C \sum_{i=1}^{n} g_i(x_i^{k-1}, \mu_{k-1}) + C \sum_{j=1}^{m} h_j(\mu_{k-1}) && (15) \\
&= obj(k-1) && (16)
\end{aligned}$$

Inequality (14) is due to Step 6 and inequality (15) is due to Step 5 of
Algorithm 1. This means that after each iteration, the objective along
the sequence generated by the algorithm is non-increasing. In addition,
it is straightforward to show that $obj(k)$ is bounded below, since
$f \ge 0$ and each $g_i$ and $h_j$ is at least $-1$. Therefore, we
conclude that our algorithm converges to a local optimum. ■


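B. Proof of Lemma 2

Proof: To prove Lemma 2, we only need to deal with the last term in
Equation (9). Fix a $\mu$. If
$\max_{y \in \Psi(d_j^-)} \{\omega^T y + b\} < -1$, then
$\partial h_j(\mu) / \partial \mu = 0$. In contrast, if
$\max_{y \in \Psi(d_j^-)} \{\omega^T y + b\} \ge -1$, then
$h_j(\mu) = \max_{y \in \Psi(d_j^-)} \{\omega^T y + b\}$. That is, with
$y^* = \arg\max_{y \in \Psi(d_j^-)} \{\omega^T y + b\}$, we have for
every $\tilde{\mu} = (\tilde{\omega}, \tilde{b})$:

$$\begin{aligned}
h_j(\tilde{\mu}) &\ge \tilde{\omega}^T y^* + \tilde{b} \\
&= \omega^T y^* + b + (\tilde{\omega} - \omega)^T y^* + (\tilde{b} - b) \\
&= h_j(\mu) + (\tilde{\omega} - \omega)^T y^* + (\tilde{b} - b)
\end{aligned}$$

It immediately follows that one subgradient of $h_j(\mu)$ is $(y^*, 1)$.
This completes the proof. ■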

C. Proof of Theorem 2

Proof: It follows that $\sum_k \alpha_k = \infty$ and
$\sum_k \alpha_k^2 < 2$. Based on Proposition 8.2.6 of the cited
reference, the sequence $\{\mu_k\}$ generated by Algorithm 2 converges to
some optimal solution of Equation (9). Then the sequence
$\{\mu_k, y^*_{jk}\}$ determined by Steps (3) and (4) in Algorithm 2 must
also converge to a global optimal solution. This completes the proof. ■




